RL Controller
A Learning-based Control Methodology for Transitioning VTOL UAVs
Lin, Zexin, Zhong, Yebin, Wan, Hanwen, Cheng, Jiu, Sun, Zhenglong, Ji, Xiaoqiang
Transition control poses a critical challenge in the development of Vertical Take-Off and Landing Unmanned Aerial Vehicles (VTOL UAVs) because the tilting-rotor mechanism shifts the center of gravity and the thrust direction during transitions. Existing methods control altitude and position in a decoupled manner, which induces significant vibration and limits both the treatment of coupling effects and adaptability. In this study, we propose a novel coupled transition control methodology driven by a reinforcement learning (RL) based controller. Moreover, in contrast to the conventional phase-transition approach, the proposed ST3M method offers a new perspective by treating cruise mode as a special case of hover. We validate the feasibility of our method in both simulation and real-world environments, demonstrating efficient controller development and migration, accurate control of UAV position and attitude, outstanding trajectory tracking, and reduced vibration during the transition process.
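The "cruise as a special case of hover" idea can be sketched as a single observation/action interface with no discrete mode switch: cruise is just the hover task commanded at a large rotor tilt. The feature layout, clipping ranges, and function names below are our illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def transition_observation(pos_err, vel, att_quat, tilt_angle):
    """Single observation vector shared by hover and cruise: cruise is simply
    hover commanded at a large tilt angle, so no mode switch is needed."""
    return np.concatenate([pos_err, vel, att_quat, [tilt_angle]])

def split_action(action, tilt_rate_limit=0.5):
    """One coupled action vector (per-rotor thrusts plus a tilt rate), instead
    of separate, decoupled altitude and position loops."""
    thrusts = np.clip(action[:-1], 0.0, 1.0)
    tilt_rate = float(np.clip(action[-1], -tilt_rate_limit, tilt_rate_limit))
    return thrusts, tilt_rate
```

Because the tilt angle is just another observation feature and the tilt rate just another action dimension, one policy can be trained across the whole hover-to-cruise envelope.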
Plasma Shape Control via Zero-shot Generative Reinforcement Learning
Wu, Niannian, Li, Rongpeng, Yang, Zongyu, Xiao, Yong, Wei, Ning, Chen, Yihang, Li, Bo, Zhao, Zhifeng, Zhong, Wulyu
Traditional PID controllers have limited adaptability for plasma shape control, and task-specific reinforcement learning (RL) methods suffer from limited generalization and the need for repetitive retraining. To overcome these challenges, this paper proposes a novel framework for developing a versatile, zero-shot control policy from a large-scale offline dataset of historical PID-controlled discharges. Our approach synergistically combines Generative Adversarial Imitation Learning (GAIL) with Hilbert space representation learning to achieve dual objectives: mimicking the stable operational style of the PID data and constructing a geometrically structured latent space for efficient, goal-directed control. The resulting foundation policy can be deployed for diverse trajectory tracking tasks in a zero-shot manner without any task-specific fine-tuning. Evaluations on the HL-3 tokamak simulator demonstrate that the policy excels at precisely and stably tracking reference trajectories for key shape parameters across a range of plasma scenarios. This work presents a viable pathway toward developing highly flexible and data-efficient intelligent control systems for future fusion reactors.
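A minimal sketch of the goal-conditioning step: in a geometrically structured latent space, the zero-shot policy can be conditioned on the unit direction from the current state's embedding to the goal's embedding. The function and symbol names (`phi_s`, `phi_g`) are our assumptions for illustration, not the paper's API.

```python
import numpy as np

def goal_direction(phi_s, phi_g, eps=1e-8):
    """Unit vector from the current embedding phi_s to the goal embedding
    phi_g; a goal-conditioned policy pi(a | s, goal_direction) can then track
    arbitrary reference trajectories without task-specific fine-tuning."""
    d = np.asarray(phi_g, dtype=float) - np.asarray(phi_s, dtype=float)
    n = np.linalg.norm(d)
    return d / n if n > eps else np.zeros_like(d)
```

Normalizing the difference is what makes the representation reusable across goals: only the direction toward the target shape matters, not its absolute position in latent space.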
Safe Reinforcement Learning-Based Vibration Control: Overcoming Training Risks with LQR Guidance
Thorat, Rohan Vitthal, Singh, Juhi, Nayek, Rajdip
Structural vibrations induced by external excitations pose significant risks, including safety hazards for occupants, structural damage, and increased maintenance costs. While conventional model-based control strategies, such as the Linear Quadratic Regulator (LQR), effectively mitigate vibrations, their reliance on accurate system models necessitates tedious system identification. This identification process can be avoided by using a model-free reinforcement learning (RL) method: RL controllers derive their policies solely from observed structural behaviour, eliminating the requirement for an explicit structural model. For an RL controller to be truly model-free, its training must occur on the actual physical system rather than in simulation. During this training phase, however, the RL controller lacks prior knowledge and exerts essentially random control forces on the structure, which can potentially harm it. To mitigate this risk, we propose guiding the RL controller with an LQR controller. While LQR control typically relies on an accurate structural model for optimal performance, our observations indicate that even an LQR controller based on an entirely incorrect model outperforms the uncontrolled scenario. Motivated by this finding, we introduce a hybrid control framework that integrates both LQR and RL controllers. In this approach, the LQR policy is derived from a model with randomly selected parameters; as this policy requires knowledge of neither the true nor an approximate structural model, the overall framework remains model-free. This hybrid approach eliminates dependency on explicit system models while minimizing the exploration risks inherent in naive RL implementations. To the best of our knowledge, this is the first study to address the critical training-safety challenge of RL-based vibration control and to provide a validated solution.
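The hybrid idea can be sketched as follows: design an LQR gain from a deliberately guessed (i.e. wrong) model, then blend its action with the RL policy's action during training. The single-storey model parameters, the blending scheme, and all names below are our assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=2000):
    """Discrete-time LQR gain for u = -K x, via Riccati fixed-point iteration
    (avoids needing scipy's solve_discrete_are)."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Hypothetical single-storey structure discretised at step dt; the mass,
# stiffness and damping values are deliberate guesses, NOT an identified model.
m, k, c, dt = 1.0, 50.0, 2.0, 0.01
A = np.array([[1.0, dt],
              [-k / m * dt, 1.0 - c / m * dt]])
B = np.array([[0.0],
              [dt / m]])
K = lqr_gain(A, B, Q=np.diag([100.0, 1.0]), R=np.array([[0.1]]))

def hybrid_action(x, u_rl, beta):
    """Blend the safe LQR baseline with the RL policy's action.
    beta in [0, 1] is the RL weight, increased as training progresses."""
    u_lqr = -K @ x
    return (1.0 - beta) * u_lqr + beta * u_rl
```

Early in training `beta` is small, so the (suboptimal but stabilizing) LQR action dominates and shields the structure from random exploration; as the RL policy improves, `beta` is annealed toward 1.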
Reinforcement Learning Based Traffic Signal Design to Minimize Queue Lengths
Nandakumar, Anirud, Banerjee, Chayan, Vanajakshi, Lelitha Devi
Efficient traffic signal control (TSC) is crucial for reducing congestion, travel delays, and pollution, and for ensuring road safety. Traditional approaches, such as fixed signal control and actuated control, often struggle to handle dynamic traffic patterns. In this study, we propose a novel adaptive TSC framework that leverages Reinforcement Learning (RL), using the Proximal Policy Optimization (PPO) algorithm, to minimize total queue lengths across all signal phases. The challenge of efficiently representing highly stochastic traffic conditions for an RL controller is addressed through multiple state representations, including an expanded state space, an autoencoder representation, and a K-Planes-inspired representation. The proposed algorithm has been implemented in the Simulation of Urban MObility (SUMO) traffic simulator and demonstrates superior performance over both traditional methods and conventional RL-based approaches in reducing queue lengths. The best-performing configuration achieves an approximately 29% reduction in average queue length compared to the traditional Webster method. Furthermore, a comparative evaluation of alternative reward formulations demonstrates the effectiveness of the proposed queue-based approach, showcasing the potential for scalable and adaptive urban traffic management.
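The queue-based reward and one plausible "expanded" state can be sketched in a few lines. The exact feature set (one-hot phase encoding, elapsed green time) is our assumption; the paper's state representations may differ.

```python
import numpy as np

def queue_reward(queue_lengths):
    """Reward = negative total queue length over all signal phases, so that
    maximising the return minimises accumulated queues."""
    return -float(np.sum(queue_lengths))

def expanded_state(queue_lengths, current_phase, n_phases, elapsed_green):
    """One plausible 'expanded' state: per-phase queue lengths, a one-hot
    encoding of the active phase, and the elapsed green time (illustrative
    feature choice, not the paper's specification)."""
    onehot = np.zeros(n_phases)
    onehot[current_phase] = 1.0
    return np.concatenate([np.asarray(queue_lengths, dtype=float),
                           onehot, [float(elapsed_green)]])
```

With this reward, a PPO agent choosing the next phase (or green duration) is pushed directly toward shorter queues, rather than toward proxies such as cumulative delay.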
EMP: Executable Motion Prior for Humanoid Robot Standing Upper-body Motion Imitation
Xu, Haocheng, Zhang, Haodong, Chen, Zhenghan, Xiong, Rong
To support humanoid robots in performing manipulation tasks, it is essential to study stable standing while accommodating upper-body motions. However, the limited controllable range of a humanoid robot in a standing position affects the stability of the entire body. We therefore introduce a reinforcement learning based framework for humanoid robots to imitate human upper-body motions while maintaining overall stability. Our approach begins with a retargeting network that generates a large-scale upper-body motion dataset for training the reinforcement learning (RL) policy; the policy enables the humanoid robot to track upper-body motion targets, with domain randomization employed for enhanced robustness. To avoid exceeding the robot's execution capability and to ensure safety and stability, we propose an Executable Motion Prior (EMP) module, which adjusts the input target movements based on the robot's current state. This adjustment improves standing stability while minimizing changes to motion amplitude. We evaluate our framework through simulation and real-world tests, demonstrating its practical applicability.
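The core operation of a motion-prior module like EMP can be sketched as projecting the commanded pose toward the current pose so the per-step change stays within an executable bound. Here the bound is a fixed constant for simplicity, whereas EMP adapts the adjustment to the robot's current state; the function name and bound are our assumptions.

```python
import numpy as np

def executable_target(q_target, q_current, max_step=0.1):
    """Clip the commanded upper-body joint pose so its deviation from the
    current pose never exceeds max_step per joint (radians). This keeps the
    tracked target inside the robot's executable range while changing the
    motion amplitude as little as possible."""
    q_target = np.asarray(q_target, dtype=float)
    q_current = np.asarray(q_current, dtype=float)
    delta = np.clip(q_target - q_current, -max_step, max_step)
    return q_current + delta
```

Targets already within the bound pass through unchanged, so feasible motions are not distorted; only overly aggressive commands are softened.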